1. INTRODUCTION

Professor Ghosh has produced a very useful, inter- 
esting piece of work which (i) argues that Bayesian 
results with objective priors may be interesting for 
frequentist statisticians, (ii) reviews two useful (un- 
related) techniques which find application in the de- 
rivation of objective priors, (iii) introduces a fam- 
ily of divergence priors which is claimed to include 
reference priors, (iv) reviews matching priors, and 
(v) demonstrates that these ideas may produce new 
objective priors. I will comment in turn on each of 
these points. 

2. OBJECTIVE BAYESIAN STATISTICS 

Professor Ghosh states that "with enough histor- 
ical data, it is possible to elicit a prior distribution 
fairly accurately." I believe this is a (possibly mis- 
leading) overstatement, an example of wishful think- 
ing. In practice, useful prior elicitation is limited to 
small text-book models with very few parameters. 
I have never seen a proper elicitation job in mod- 
erately complex conventional models (say a logis- 
tic regression), let alone in really complex problems. 
In optimal circumstances, one may be able to elicit 
a proper joint prior for a couple of parameters of in- 
terest, but one is then forced to assume some form 
of objective conditional prior for the many nuisance 
parameters typically present in any real application. 
Some people then use a "flat" prior, typically a lim- 
iting form of some conjugate family of priors; but 
this is a very dangerous procedure, for one does not 
control the implications of the choice made, which may
result in severely biased, or even improper, posteriors.
There is simply no substitute for the search for
a well-motivated objective prior.

The author further states that "Bayesian meth- 
ods, if judiciously used, can produce meaningful in- 
ferences based on. . . objective priors" and makes ref- 
erence to several problems where frequentist meth- 
ods fail to produce sensible answers, while objec- 
tive Bayesian methods certainly succeed. I surely 
agree with this, but I find this to be an under- 
statement. Ever since Wald (1950) proved that to 
be admissible (a frequentist concept!) a procedure 
must be Bayesian, people have found, over and over 
again, that (as could have been expected from this 
general result) the frequentist performance of ob- 
jective Bayesian procedures is typically very good, 
and often better than that of the procedures de- 
rived from ad hoc frequentist methods. Actually, 
one could well invert the conventional teaching of 
mathematical statistics, by teaching first objective 
Bayesian methods (motivated from first principles), 
and then introducing frequentist ideas and proving 
that, under replication, objective Bayesian methods 
also perform very well. 

3. ASYMPTOTIC EXPANSIONS AND 
SHRINKAGE 

Theorem 1 is a very useful result. . . when it is ap- 
plicable. This essentially requires conditions for the 
posterior to be asymptotically normal, and we all 
know many important examples where this is not 
the case. It is conceivable that alternative asymptotic
expansions may similarly be obtained in those
"nonregular" cases, and I would like Professor Ghosh 
to comment on this. 

The shrinkage argument introduced by J. K. Ghosh 
was a welcome addition to the mathematical statistician's
toolkit. It often provides an elegant, efficient
procedure to obtain conditional expectations. This 
is another example of the power of techniques that
work with sequences of priors defined on compact
sets, a procedure pioneered in the construction of
reference priors, and developed in detail in Berger, 
Bernardo and Sun (2009), where these types of se- 
quences are used to derive reference priors in com- 
pletely general situations, with no assumptions of 
asymptotic normality. 

4. DIVERGENCE PRIORS 

Professor Ghosh recalls that in the original pa- 
per on reference priors (Bernardo, 1979), these are 
obtained by (heuristically) maximizing the expected 
KL divergence (better known as Shannon expected 
information), as the number of replications goes to 
infinity, and quotes a later result (actually published
in Berger, Bernardo and Mendoza (1989), not in
Berger and Bernardo (1989)), where it is proven that
maximization for a finite n may lead to a discrete
prior. It may be worth pointing out that in reference
analysis one does not let the sample size n go
to infinity, but considers k replications of the original
experiment and lets k go to infinity, which may
be very different. Indeed, as a direct consequence
of this, the reference prior may depend on the design
(two-sample problems provide many examples
of this situation; see Bernardo and Perez (2007) on 
the comparison of normal means for a relatively re- 
cent example). How is this implemented using the 
expansion techniques? 

Moreover, although the mathematical consequen- 
ces are very nice, the original reason to consider an 
infinite number of replications was not mathematical
convenience, but first principles: one wants to
find the prior that maximizes the missing informa- 
tion about the quantity of interest, and the complete 
missing information would only be attained by an 
infinite number of replications. 
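
As a rough numerical illustration of the idea of missing
information, the following sketch computes Shannon's expected
information (the expected KL divergence of the posterior from the
prior) provided by k replications of a single Bernoulli trial. The
five-point grid for theta, the uniform weights on that grid, and the
function names are assumptions made purely for illustration; this is
not a computation of a reference prior.

```python
import math


def binom_pmf(x, n, theta):
    """Binomial probability of x successes in n trials."""
    return math.comb(n, x) * theta**x * (1.0 - theta) ** (n - x)


def expected_information(thetas, prior, n):
    """Expected KL divergence of the posterior from the prior.

    `thetas` is a finite grid of parameter values, `prior` the prior
    weights on that grid (illustrative assumptions), and the data are
    the number of successes in n Bernoulli trials.
    """
    info = 0.0
    for x in range(n + 1):
        likelihoods = [binom_pmf(x, n, t) for t in thetas]
        marginal = sum(p * l for p, l in zip(prior, likelihoods))
        for p, l in zip(prior, likelihoods):
            if p > 0.0 and l > 0.0:
                info += p * l * math.log(l / marginal)
    return info


# Hypothetical setup: one Bernoulli trial replicated k times, so the
# sufficient statistic is Binomial(k, theta).
thetas = [0.1, 0.3, 0.5, 0.7, 0.9]
prior = [1.0 / len(thetas)] * len(thetas)
replications = [1, 5, 25, 100]
infos = [expected_information(thetas, prior, k) for k in replications]

for k, value in zip(replications, infos):
    print(f"k = {k:3d}  expected information = {value:.4f}")
print(f"prior entropy (upper bound) = {math.log(len(thetas)):.4f}")
```

On a finite grid the expected information increases with k and is
bounded above by the entropy of the prior, which is the amount of
information still missing before any data are observed.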

The fact that (with only one parameter and un- 
der regularity conditions which guarantee asymp- 
totic normality) the missing information is maxi- 
mized by Jeffreys' prior for all the information mea- 
sures derived from a family of divergences which 
encompass both KL and Hellinger is reassuring, in 
that the result seems to be pretty stable with re- 
spect to changes in the definition of information. 
That said, I would argue that there are many independent
arguments (additivity, for one) suggesting that Shannon's
is the appropriate measure of information
in mathematical statistics. It follows that
I am very suspicious of the properties of the priors
derived by maximizing the expected chi-squared distance.
In particular, in the binomial case, I fail to
see any reason to prefer a Beta(1/4, 1/4) prior over
Jeffreys' Beta(1/2, 1/2), which is well justified from many
(really many!) points of view. Can the author provide
any such reason?
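
For readers who wish to see where Jeffreys' Beta(1/2, 1/2) comes
from in the binomial case, the sketch below computes the Fisher
information by direct summation and checks that its square root is
proportional to the Beta(1/2, 1/2) kernel. It also prints the
posterior means under the two priors discussed above. The sample
size, the grid of theta values and the function names are
illustrative assumptions, and the comparison of posterior means is
only descriptive; it does not by itself settle which prior is
preferable.

```python
import math


def binom_pmf(x, n, theta):
    """Binomial probability of x successes in n trials."""
    return math.comb(n, x) * theta**x * (1.0 - theta) ** (n - x)


def fisher_information(n, theta):
    """Expected squared score of Binomial(n, theta), by summation."""
    total = 0.0
    for x in range(n + 1):
        score = x / theta - (n - x) / (1.0 - theta)
        total += binom_pmf(x, n, theta) * score**2
    return total


def jeffreys_kernel(theta):
    """Unnormalized Beta(1/2, 1/2) density."""
    return theta**-0.5 * (1.0 - theta) ** -0.5


def posterior_mean(x, n, a):
    """Mean of the Beta(x + a, n - x + a) posterior under a Beta(a, a) prior."""
    return (x + a) / (n + 2.0 * a)


n = 10  # hypothetical sample size
grid = [0.1, 0.25, 0.5, 0.75, 0.9]

# If sqrt(I(theta)) is proportional to the Beta(1/2, 1/2) kernel,
# these ratios are all equal to the same constant, sqrt(n).
ratios = [math.sqrt(fisher_information(n, t)) / jeffreys_kernel(t) for t in grid]
print("sqrt(Fisher information) / Beta(1/2, 1/2) kernel:", ratios)

for x in (0, 1, 5):
    print(
        f"x = {x}: posterior mean with Beta(1/4, 1/4) = {posterior_mean(x, n, 0.25):.4f}, "
        f"with Beta(1/2, 1/2) = {posterior_mean(x, n, 0.5):.4f}"
    )
```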

The concept of general divergence priors concep- 
tually includes that of reference priors in that it 
uses a family of expected divergences which includes 
Shannon as a particular, limiting case. The specifics 
of the paper, however, exclusively refer to the rela- 
tively simple situation where asymptotic normality 
may be guaranteed. I would like to see some ex- 
amples of "nonregular" problems solved before con- 
cluding that the techniques described in this paper 
may be used in general. Nonregular problems were 
already solved in the original (Bernardo, 1979) for- 
mulation of reference priors, and have been rigor- 
ously analyzed in Berger, Bernardo and Sun (2009). 

5. PROBABILITY MATCHING 

As mentioned above, it is certainly interesting to 
analyze the frequentist properties of objective Baye- 
sian results, but one does not necessarily want to 
reproduce frequentist behavior. For instance, in the 
ratio of normal means problem mentioned by Profes- 
sor Ghosh in the Introduction, one certainly does not 
want to "match" the unacceptable coverage proper- 
ties of the conventional frequentist solutions. 

More importantly, I see invariance under repa- 
rameterization as a necessary prerequisite for any 
general procedure to derive objective priors, for the 
resulting (presumably objective) inferences cannot 
possibly depend on the arbitrary (and hence irrele- 
vant) parameterization chosen to formalize the prob- 
lem. It follows that, although it is certainly useful 
and important to study the eventual matching prop- 
erties of priors, I believe that requiring matching 
is not a sensible procedure to choose an objective 
prior. 
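
To make the invariance requirement concrete, the following sketch
uses the binomial model with the log-odds phi = log(theta/(1 - theta))
as an alternative parameterization. It checks numerically that
Jeffreys' prior derived directly in phi coincides with Jeffreys'
prior in theta transformed by the usual change of variables, whereas
a flat prior on theta does not remain flat in phi. The choice of
model, of reparameterization, and all function names are assumptions
made for illustration only.

```python
import math


def theta_of(phi):
    """Inverse log-odds transformation."""
    return 1.0 / (1.0 + math.exp(-phi))


def jeffreys_theta(theta):
    """sqrt of the Bernoulli Fisher information in theta: 1 / (theta (1 - theta))."""
    return math.sqrt(1.0 / (theta * (1.0 - theta)))


def jeffreys_phi(phi):
    """sqrt of the Bernoulli Fisher information in phi: theta (1 - theta)."""
    theta = theta_of(phi)
    return math.sqrt(theta * (1.0 - theta))


def flat_theta(theta):
    """Flat (uniform) prior density on theta."""
    return 1.0


def induced_on_phi(prior_theta, phi):
    """Density on phi induced by a prior density on theta (change of variables)."""
    theta = theta_of(phi)
    jacobian = theta * (1.0 - theta)  # d theta / d phi
    return prior_theta(theta) * jacobian


phis = [-2.0, 0.0, 1.5]

# Jeffreys: deriving the prior in phi, or transforming the one in theta,
# gives the same answer, so these gaps are zero up to rounding.
gaps = [abs(induced_on_phi(jeffreys_theta, p) - jeffreys_phi(p)) for p in phis]
print("Jeffreys, direct vs transformed (absolute gaps):", gaps)

# Flat prior on theta: the induced density on phi is not constant.
flat_on_phi = [induced_on_phi(flat_theta, p) for p in phis]
print("flat prior on theta, induced density on phi:", flat_on_phi)
```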

6. NEW OBJECTIVE PRIORS 

Professor Ghosh states "I believe very strongly 
that many new priors will be found in the future 
by either a direct application or slight modifica- 
tion of these tools," and I agree that this is indeed 
quite plausible. However, a new objective prior is
not necessarily a good prior. One needs
objective priors which satisfy a number of desider- 
ata: general applicability, appropriate marginaliza- 
tion properties, invariance, strong consistency, and 
so on [see Bernardo (2005), for a general discussion].
And new priors which do not satisfy those desider- 
ata should probably not even be considered. Some 
would say that "the proof of the pudding is in the 
eating"; that may be so, but then I would like Pro- 
fessor Ghosh to quote at least one convincing exam- 
ple where he would propose to use an objective prior 
which is not a reference prior. 